DETAILED ACTION

Claim Rejections - 35 USC § 103 

1. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

2. Claims 1 to 3, 5 to 7, and 9 to 11 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Morris in view of McMullan, Jr. et al.

Concerning independent claims 1, 5, and 9, Morris discloses a method, device,
and system for speech recognition, comprising: 

"receiving audio signals from a speech source" - system 100 captures any 
speech with speech input unit 104 (column 4, lines 15 to 19: Figures 1 and 2: Steps 202 
and 204); 

"receiving video signals from the speech source" - system 100 captures the 
user's image with video input unit 102 (column 4, lines 15 to 19: Figures 1 and 2: Steps 
202 and 204); 

"converting at least one of the audio signals and the video signals into 
recognizable information" - system 100 interprets any verbal input using the speech 
recognition functions of multi-sensor fusion/recognition system 106; the speech 
recognition is supplemented by the visual information captured by video input unit 102, 
such as any interpreted facial expressions (e.g., lip-reading) (column 4, lines 25 to 31:
Figures 1 and 2: Step 206); 

"implementing a task based on the recognizable information" - system 100 
searches knowledge database 116, or additional resources such as the Internet, for a
response to an objective question (column 5, lines 8 to 13: Figures 1 and 3: Step 310). 

Concerning independent claims 1, 5, and 9, Morris discloses one embodiment
where only image data from the video unit 102 is used, and the audio data from the 
speech input unit 104 is ignored for recognition purposes. (Column 3, Lines 24 to 27) 
Morris omits "detecting if the audio signals can be processed, wherein detecting if the 
audio signals can be processed comprises defining an error threshold, comparing a 
number of errors detected in the audio signal with the threshold, and determining that 
the audio signals can not be processed if the number of detected errors equals or 
exceeds a threshold" and "processing the audio signal if it is detected that the audio
signals can be processed, and processing the video signals based on a detection that at 
least a portion of the audio signal cannot be processed". However, McMullan, Jr. et al. 
teaches a digital audio muting system and method, where an error detection system has 
a programmable error sensitivity, and a digital audio data muting circuit mutes a digital 
audio transmission system when a large number of errors per unit time are detected in
the received digital audio system. (Column 1, Lines 6 to 14) An error limit register 
stores a predetermined threshold value and a comparator compares a sum output of a 
counter with the predetermined threshold value from the error limit register. A disable 
signal is output when the sum of the counter exceeds the predetermined threshold. 
(Column 4, Lines 5 to 12) The error muting system is used in conjunction with error
correction in digital audio, video, speech, or other kinds of digital signal transmission 
systems to provide predictable error muting when the received digital signal has 
numerous data errors which cannot be corrected. (Column 2, Line 63 to Column 3, Line 
5) Thus, McMullan, Jr. et al. provides for muting the audio when the number of errors
is too high, so that only the video would be processed when it is determined that the
audio cannot be processed. It would have been obvious to one having ordinary skill in 
the art to employ the error processing/muting system and method of McMullan, Jr. et al. 
in a multi-sensor fusion/recognition unit of Morris for a purpose of providing error muting 
when a received digital signal has errors which cannot be corrected. 
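
For illustration only, the threshold comparison recited in claims 1, 5, and 9,
combined with a fallback to the video signals, may be sketched as follows. The
sketch is not taken from Morris or McMullan, Jr. et al.; the names ErrorGate,
audio_can_be_processed, and select_input, and the threshold value of 5, are
hypothetical. The comparison follows the claim language ("equals or exceeds"),
whereas McMullan, Jr. et al. describes a disable signal when the counter sum
exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass
class ErrorGate:
    """Hypothetical gate: claimed threshold test plus a video fallback."""

    error_threshold: int  # illustrative value; not taken from any cited reference

    def audio_can_be_processed(self, detected_errors: int) -> bool:
        # Claim language: the audio cannot be processed if the number of
        # detected errors equals or exceeds the threshold.
        return detected_errors < self.error_threshold

    def select_input(self, detected_errors: int) -> str:
        # Process the audio when it is usable; otherwise process the video.
        if self.audio_can_be_processed(detected_errors):
            return "audio"
        return "video"


gate = ErrorGate(error_threshold=5)
selected = [gate.select_input(n) for n in (0, 4, 5, 9)]
```

Under these assumptions, an error count at the threshold itself already selects
the video signals, which is the boundary case distinguishing "equals or
exceeds" from "exceeds".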

Concerning claims 2, 6, and 10, Morris discloses that the speech recognition is 
supplemented by the visual information captured by video input unit 102, such as any 
interpreted facial expressions (e.g., lip-reading) ("video images of lip movements that 
coincide with the audio signals") (column 4, lines 27 to 31: Figures 1 and 2). 

Concerning claims 3, 7, and 11, Morris discloses that multi-sensor
fusion/recognition unit 106 receives image data and audio input at the same time 
(column 2, line 66 to column 3, line 9: Figure 1); implicitly, these signals "coincide" and 
are received "in parallel". 

3. Claims 4, 8, and 12 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Morris in view of McMullan, Jr. et al. as applied to claims 1, 5, and 9 above, and
further in view of Bakis et al. 

Morris does not expressly disclose a storage unit for storing the audio signals
and the video signals to a destination source, and a transmitter for sending the audio 
signals and the video signals to a destination source. However, it is well known to 
operate biometric identification via a client/server network, where biometric data is 
stored on a server, and biometric data is collected locally but compared to stored 
biometric data on the server. Bakis et al. teaches an analogous art method and 
apparatus for recognizing the identity of individuals by a speaker recognition system 
and a lip classifier, where biometric attributes are pre-stored for later retrieval so that 
they may be compared. Further, a server is included for interfacing with a plurality of 
biometric recognition systems to receive requests for biometric attributes therefrom and 
transmit biometric attributes thereto. The server has a memory device for storing the 
biometric attributes. (Column 8, Line 47 to Column 9, Line 16) Objectives are to 
provide a significant increase in the degree of accuracy of recognition and to provide a
significant reduction in fraudulent or errant access to a service and/or facility. (Column 
2, Lines 50 to 56) It would have been obvious to one having ordinary skill in the art to 
store and send biometric attributes to a server ("a destination source") as taught by 
Bakis et al. in a method, device, and system for combining audio and video signals of 
Morris for purposes of increasing accuracy of recognition and reducing fraudulent 
access. 

4. Claims 22 to 25 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Morris in view of McMullan, Jr. et al. as applied to claims 1 and 5 above, and further in
view of Suomela et al. 

Morris omits any description of the computer system for performing the multi- 
sensor fusion recognition as a laptop computer, a home computer, or a mobile phone, 
but it is well known that various types of computer systems are applicable for speech 
recognition and image processing. Specifically, Suomela et al. teaches a method for
speech recognition, where a terminal may be a mobile phone, a desktop, or a laptop 
computer. (¶[0023]) An objective is to provide speech as an input to a terminal of an
electronic device so that a user can operate the device while performing other tasks 
such as walking or driving a motor vehicle. (¶[0004]) It would have been obvious to
one having ordinary skill in the art to implement the multi-sensor fusion recognition of 
Morris on a laptop computer or mobile phone as taught by Suomela et al. for a purpose 
of permitting a user to perform additional tasks while operating the device. 

Allowable Subject Matter 

5. Claims 19 to 21 and 26 to 27 are allowed. 

Response to Arguments 

6. Applicant's arguments filed 12 January 2009 have been considered but are moot 
in view of the new grounds of rejection. 

Conclusion

7. The prior art made of record and not relied upon is considered pertinent to 
Applicant's disclosure. 

Lee et al. suggests a camera that communicates to the user with a red light that
a user needs to be more distant from the camera and with a green light that the user 
needs to come nearer to the camera. 

Hollier et al., Kiessling et al., Fujimura et al., Yoshizawa et al., Durand, and 
Biondo, Jr. disclose related art. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to MARTIN LERNER whose telephone number is 
(571)272-7608. The examiner can normally be reached on 8:30 AM to 6:00 PM 
Monday to Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David R. Hudspeth, can be reached on (571) 272-7843. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 



/Martin Lerner/ 
Primary Examiner 
Art Unit 2626 
March 4, 2009 



